ABSTRACT


A website has become essential for almost every business. Selling on e-commerce websites is not new, but keeping a site running in a way that meets both business expectations and traffic demands remains a constant challenge, and most IT support teams learn something new every day. Some days are expected to bring more sales than others, and the business benefits greatly if its IT systems can absorb the extra traffic and convert visitors into sales. In this article, we describe the various ways IT teams can prepare for such days of anticipated high website traffic.
I. INTRODUCTION

A website has become essential for almost every business. Selling on e-commerce websites is not new, but keeping a site running in a way that meets both business expectations and traffic demands remains a constant challenge, and most IT support teams learn something new every day. Some days are expected to bring more sales than others, and the business benefits greatly if its IT systems can absorb the extra traffic and convert visitors into sales. These anticipated days can be anything from a planned one-day flash sale to the entire holiday peak season. Nothing is more disheartening for a business than having inventory to sell and customers who want to buy online while the IT systems are down. In this article, we describe the various ways IT teams can prepare for such days of anticipated high website traffic.


II. Announcement and Types of Events 

The earlier the teams are notified, the better they can prepare their systems. Early notice is not always possible, however, because the business may have sudden requests or plans. And even though the holiday season is known well in advance every year, many websites still fail to handle the load. The types of expected high-load days are described below.


Impromptu Sales - The business might plan flash sales and discounts to boost sales or adjust inventory, or a competitor's sale event might trigger one. These are the most challenging events for scheduling IT resources: IT support teams have to pause their current tasks and reprioritize around the sale event. It becomes even more difficult if a major unresolved issue or a planned maintenance activity coincides with the sale.


Event-Based Sales - These sales depend on certain events and their outcomes. For example, after a sports event, the website must launch sale items featuring the winning team's merchandise. Such events require accelerated content and inventory updates and a quick website refresh, and the team is expected to react fast once the outcome is known. Brand prestige is often at stake as well: a website that refreshes its homepage and starts selling the winners' merchandise sooner than its competitors gains an edge.


Calendar Events - These events fall on fixed days throughout the year and are often tagged as seasonal, for example Valentine's Day, Father's Day and back to school. They can be planned ahead and are easy to handle if orchestrated properly. Calendar event sales also give a measure of the sales and traffic to expect in the peak season; for example, peak-season traffic might be three times the traffic of the back-to-school sale.


Holiday Sales / Peak Season - These are a special type of calendar event. They began long ago as retailers offering deep discounts to thank their shoppers for their business during the year. These sales are termed peak season or holiday season sales; they generally start around Thanksgiving Day and end around Christmas. They originated mostly in the USA but are now seen, in one format or another, almost all around the globe. In recent years, they have been starting earlier, as early as the beginning of November. Many retailers make a large share of their annual sales and revenue on these days.


III. Meet and Plan

Whether or not events are coming up, regular meetings between IT and the business keep IT teams aware of upcoming events and business expectations. They also let the business know the calendar of upcoming maintenance and IT changes, so both parties can plan around any conflicting dates. These meetings are also an opportunity to evaluate website features that might be underperforming or missing relative to business needs.


Plan and Synchronize - Refer to Figure 1. Once an upcoming sales event is known, planning for it in a common meeting with all stakeholders helps every team adjust its priorities. Forgetting to inform a stakeholder can cause major roadblocks during the sales event. It therefore helps to keep the list of teams handy and to meet regularly, for example every week even when no events are planned, so that each team is aware of the others' changes and schedules and can synchronize its activities for the big day. The main stakeholders are the Application team, the Business, Infrastructure and the Service Desk. Depending on the software products used, third-party systems such as payment and tax systems are a critical part of the sales flow and should be included in communication. Preparing a checklist of actions for each team and meeting again a couple of days before the event to assess readiness is often very productive. A valuable metric is how much time each IT system needs to push changes to the next system or to the website. For example, if the business enters product information in a retail system (e.g. Oracle Retail or SAP), it might take hours or a day for that product to appear on the website, depending on processes such as data enrichment and approval. Every company has its own approximate processing time. Similarly, teams should know how long inventory entered or updated in the source system takes to show on the site after flowing through the intermediate systems. Focusing on the flows for page content, product information, pricing and inventory yields the most benefit.
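The propagation-time metric above lends itself to a simple calculation. The Python sketch below sums hypothetical per-stage latencies for product data and works back from the event start to the latest time the business can enter data in the source system. The stage names, durations, the 12-hour safety margin and the function names are all illustrative assumptions; real values have to be measured in each company's own pipeline.

```python
from datetime import datetime, timedelta

# Hypothetical per-stage latencies for the product data flow.
# Real values must be measured in each company's own pipeline.
PRODUCT_PIPELINE = [
    ("retail system entry", timedelta(hours=0)),
    ("data enrichment", timedelta(hours=6)),
    ("business approval", timedelta(hours=4)),
    ("publish to website", timedelta(hours=2)),
]


def total_latency(stages):
    """Sum the time every stage needs before data reaches the next system."""
    return sum((duration for _, duration in stages), timedelta())


def latest_entry_time(event_start, stages, safety_margin=timedelta(hours=12)):
    """Latest time data can be entered in the source system and still be
    live on the website before the event starts, keeping a safety margin."""
    return event_start - total_latency(stages) - safety_margin


if __name__ == "__main__":
    sale_start = datetime(2024, 11, 29, 0, 0)
    print(total_latency(PRODUCT_PIPELINE))                  # 12:00:00
    print(latest_entry_time(sale_start, PRODUCT_PIPELINE))  # 2024-11-28 00:00:00
```

With these made-up numbers, product data entered by midnight on November 28 would be live before a sale starting at midnight on November 29. The same functions can be reused for pricing and inventory by giving each flow its own stage list.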


Figure 1 Preparing for Sales Events (surviving labels: Sales; Frauds and Returns)


IV. Assess and Prepare 


While the business works on the back end, for example preparing merchandise and pricing, the other teams need to work in parallel to assess their systems' readiness to handle the sales event.


Baseline, Alerts and Monitor - Baselining the existing systems is a critical part of the process if it has not been done already. The metrics of every system on a normal operating day should be known, along with how much more load it can take without modification, i.e. its threshold levels. If the expected traffic would breach these thresholds, the infrastructure must be reassessed and improved as early as possible. For example, the sample application in Figure 2 can handle sale event 1 at 85% memory utilization, slightly above the P2 alert threshold of 80% but within the P1 threshold. It cannot, however, handle sale event 2 with its current infrastructure. Depending on the type of infrastructure, adding capacity can take weeks (physical servers) or just minutes (cloud). Once thresholds and expected deviations are calculated, alerts often have to be readjusted to keep monitoring and alerting effective. For example, a queue depth above 100 might trigger an alert on a normal day, but a depth of 200 might be normal on the sale day, and alerting has to be adjusted accordingly.
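The threshold reasoning in the Figure 2 example can be expressed as a small check. This is a minimal sketch assuming memory use scales linearly with traffic, which real systems rarely do exactly, so load tests should confirm any projection. The 80% P2 threshold and the queue-depth numbers come from the text; the 90% P1 threshold, the 40% baseline and all function names are assumptions for illustration.

```python
# Alert thresholds for memory utilization, in percent.
# P2 = 80% comes from the Figure 2 example; P1 = 90% is an assumed value.
P2_THRESHOLD = 80.0
P1_THRESHOLD = 90.0


def project_utilization(baseline_pct, traffic_multiplier):
    """Naively project sale-day utilization, assuming usage grows
    linearly with traffic (an assumption; confirm with load tests)."""
    return baseline_pct * traffic_multiplier


def alert_level(utilization_pct):
    """Return the alert that a given utilization would trigger."""
    if utilization_pct >= P1_THRESHOLD:
        return "P1"
    if utilization_pct >= P2_THRESHOLD:
        return "P2"
    return "none"


def can_handle(utilization_pct):
    """Following the text, an event is survivable if it stays below P1."""
    return utilization_pct < P1_THRESHOLD


def sale_day_threshold(normal_threshold, traffic_multiplier):
    """Scale a count-based alert threshold, such as queue depth, for the sale day."""
    return normal_threshold * traffic_multiplier


if __name__ == "__main__":
    projected = project_utilization(40.0, 2.0)
    print(projected, alert_level(projected), can_handle(projected))  # 80.0 P2 True
    print(sale_day_threshold(100, 2))  # 200
```

In this made-up case, an application idling at 40% memory that expects double traffic lands exactly on the P2 line, and the queue-depth alert moves from 100 to 200, matching the example in the text.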


Figure 2 Memory Usage: Application 1
Design Patterns to Recover - Code quality and system stability are a continuous effort throughout the year, and sale events are an opportunity to reassess performance. Applying appropriate architectural design patterns is a great way to build robust systems. The Circuit Breaker and Feature Flag patterns are worth considering when architecting high-availability website functionality and flows. The Circuit Breaker pattern addresses the problem that once an error occurs in running software, for example a failing call to a downstream service, it keeps recurring until it is fixed and can bring down the entire system. With a circuit breaker in place, the system detects the error and automatically disables the corresponding flow or routes it to an alternate flow, so the error does not keep recurring; it then periodically lets a trial request through to check whether the problem has been fixed. The Feature Flag pattern uses flags to control flows or features. Flags are normally used to test features on the website, but on high-traffic days they can also disable non-critical flows. For example, flags often control features such as the cart pop-up or marketing features, which can be switched off under heavy load. Flags should be tested to ensure they work as needed: if not implemented properly, feature flags might require server restarts, which can bring down a system under heavy load and cause more harm.
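Both patterns can be sketched in a few lines of Python. The circuit breaker below opens after a configurable number of consecutive failures, serves a fallback while open, and lets one trial call through after a timeout (the half-open state). The feature flag store is read on every request, so a flag such as the hypothetical `cart_popup` can be switched off without a restart. Class names, parameters and default values are assumptions; the sketch is single-threaded and keeps flags in memory, whereas a production setup would need locking and a shared flag store.

```python
import time


class CircuitBreaker:
    """After `failure_threshold` consecutive failures the circuit opens and
    calls go straight to the fallback. Once `reset_timeout` seconds pass,
    one trial call is allowed (half-open): success closes the circuit,
    failure opens it again."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: do not touch the failing dependency
            # timeout elapsed: half-open, let this call through as a trial
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


class FeatureFlags:
    """Flags are looked up on every request, so flipping one takes effect
    immediately without a server restart."""

    def __init__(self, **defaults):
        self._flags = dict(defaults)

    def is_enabled(self, name):
        return self._flags.get(name, False)  # unknown flags default to off

    def set(self, name, enabled):
        self._flags[name] = enabled
```

A call to a slow, non-critical dependency could be wrapped as `breaker.call(fetch_recommendations, lambda: [])`, where `fetch_recommendations` stands in for any such call, and page code would check `flags.is_enabled("cart_popup")` before rendering the pop-up.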


Caching - Caching keeps load off the core infrastructure and also delivers content to users several times faster, which makes it a lifesaver for any website experiencing high traffic. Caching exists at every level of the user flow, from APIs to content delivery networks (CDNs) to the browser. During sales days, some pages are visited far more than others. Caching is often a tradeoff between page staleness and performance, so careful calculation is needed to balance the acceptable staleness of a page against the system's ability to take new requests. Avoiding cache clears in the middle of the day helps keep the system stable; system restarts and cache clears should be planned for off hours, when expected traffic is negligible. If the business shares the URLs of the website pages it plans to use most for the sale, the IT team can validate that those URLs are included in the caching URL patterns, get the required caching, and show up as landing pages for relevant keyword searches.
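The staleness versus performance tradeoff can be illustrated with a tiny time-to-live (TTL) cache. This is an in-process sketch under simplifying assumptions, not how a CDN or reverse proxy is configured: `ttl_seconds` stands for the staleness the business accepts, and `origin_hits` counts how often the core infrastructure is actually hit. The class, its names and the page URL are hypothetical.

```python
import time


class TTLCache:
    """Serve a cached copy while it is younger than `ttl_seconds`;
    otherwise reload it from the origin."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, time it was stored)
        self.origin_hits = 0

    def get(self, key, load):
        entry = self._store.get(key)
        if entry is not None and self.clock() - entry[1] < self.ttl:
            return entry[0]  # fresh enough: serve from cache
        self.origin_hits += 1  # miss or stale: go to the origin
        value = load(key)
        self._store[key] = (value, self.clock())
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=300)
    for _ in range(1000):
        cache.get("/deals/doorbusters", lambda url: "<html>" + url + "</html>")
    print(cache.origin_hits)  # 1
```

With a 300-second TTL, a thousand requests for the same page within five minutes reach the origin once. Halving the TTL makes the page fresher but roughly doubles the origin load for a page with steady traffic.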


Non-Essential Jobs and Logs - On a normal day, there could be tens to hundreds of scheduled jobs in each subsystem, pushing or pulling data between systems. These jobs consume network bandwidth as well as system compute and storage (refer to Figures 3 and 4). Disabling non-essential jobs frees up bandwidth for essential data flows. Common examples of such jobs are differential backup services. Some jobs, such as the product extract to marketing systems, can only be changed with business approval.


Essential Jobs - Even essential jobs, such as the flows of product data, pricing and inventory, run multiple times a day and on a regular day might be set up to run throughout the day. Running the product data and pricing flows only once per day reduces the bandwidth load on integrations and also reduces the need for cache clears. Inventory should remain the key data flow enabled throughout the day to ensure that product availability on the website stays accurate.
Figure 3 Regular Day (all flows enabled)
Figure 4 High Traffic Day (only critical flow)
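The move from the regular-day schedule (Figure 3) to the high-traffic schedule (Figure 4) can be sketched as a transformation over a job list. The job names and frequencies below are hypothetical; the rules follow the text: drop non-essential jobs, run product data and pricing once a day, and keep inventory running throughout the day. Dropping a job like the marketing extract still needs business approval, which no script replaces.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Job:
    name: str
    essential: bool
    runs_per_day: int


# Hypothetical regular-day schedule; names and frequencies are illustrative.
REGULAR_DAY = [
    Job("inventory_sync", essential=True, runs_per_day=96),
    Job("product_data_sync", essential=True, runs_per_day=6),
    Job("pricing_sync", essential=True, runs_per_day=6),
    Job("differential_backup", essential=False, runs_per_day=4),
    Job("marketing_product_extract", essential=False, runs_per_day=2),
]

# Flows that must keep running throughout the sale day.
KEEP_FREQUENT = {"inventory_sync"}


def high_traffic_schedule(jobs):
    """Drop non-essential jobs and run the remaining ones once a day,
    except flows that must stay frequent (inventory)."""
    schedule = []
    for job in jobs:
        if not job.essential:
            continue  # disabled for the sale day
        if job.name in KEEP_FREQUENT:
            schedule.append(job)
        else:
            schedule.append(replace(job, runs_per_day=1))
    return schedule


if __name__ == "__main__":
    peak = high_traffic_schedule(REGULAR_DAY)
    print(sum(j.runs_per_day for j in REGULAR_DAY), sum(j.runs_per_day for j in peak))  # 114 98
```

In this made-up example, daily job runs fall from 114 to 98, and almost all of the remaining runs are inventory syncs.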


Contact List - If there is a help desk team, it is normally responsible for coordinating IT issues and incidents and for maintaining the contact list of all internal teams and dependent external vendors. If there is no service desk team, the application support teams own this. When an issue occurs, time is of the essence, because every minute of downtime means lost revenue. Effective documentation of every required party's contact methods and SLAs is therefore highly beneficial.


People Roster - Even though systems run on their own, every successful IT operation needs constant monitoring, human intervention and cross-team synchronization. Most IT support teams maintain a daily support roster throughout the year. For sales days, synchronizing rosters between teams, placing all rosters in a commonly accessible location and including executives in the roster for escalation helps coordinate issues better on critical days.
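A shared roster with an escalation path becomes unambiguous when it is written down as data. The ladder below is a hypothetical example: the time limits, roles and the `who_to_page` function are assumptions rather than a recommended policy, and each organization would set them from its own SLAs.

```python
from datetime import timedelta

# Hypothetical sale-day escalation ladder: each level is engaged once an
# incident has been open for at least the given time.
ESCALATION = [
    (timedelta(minutes=0), "on-call application support"),
    (timedelta(minutes=15), "team lead and service desk manager"),
    (timedelta(minutes=30), "IT director"),
    (timedelta(minutes=60), "executive on the sale-day roster"),
]


def who_to_page(incident_age):
    """Return every level that should be engaged for an incident of this age."""
    return [contact for after, contact in ESCALATION if incident_age >= after]
```

Keeping the ladder in one shared place, alongside the contact list, means every team escalates the same way during the event.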


Change Freeze - The fewer the changes to a system, the more stable it is. How many days before the sales event the code freeze should start depends on how big the event is, how mature the system is and what known issues it has.


Disaster Recovery - Most systems now have disaster recovery plans and systems, but these are not tested often and are frequently found not to work as expected when a real disaster strikes. Testing the disaster recovery process through intentional chaos or shutdowns helps evaluate the readiness of systems, infrastructure and people, and above all the organizational objectives for RTO (Recovery Time Objective) and RPO (Recovery Point Objective). RTO translates to how much downtime is acceptable; RPO roughly translates to how much data loss, measured in time, is acceptable.
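A disaster recovery drill can be scored against RTO and RPO with simple arithmetic: downtime runs from the outage to restored service, and the data loss window runs from the newest data replicated to the recovery site to the outage. The sketch below assumes those three timestamps are recorded during the drill; the timestamps, objectives and names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    outage_start: datetime      # primary site intentionally shut down
    service_restored: datetime  # site serving traffic again from the DR site
    last_replicated: datetime   # timestamp of the newest data present at the DR site


def evaluate_drill(result, rto, rpo):
    """Score a disaster recovery drill against RTO and RPO."""
    downtime = result.service_restored - result.outage_start
    data_loss_window = result.outage_start - result.last_replicated
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "meets_rto": downtime <= rto,
        "meets_rpo": data_loss_window <= rpo,
    }


if __name__ == "__main__":
    drill = DrillResult(
        outage_start=datetime(2024, 10, 1, 2, 0),
        service_restored=datetime(2024, 10, 1, 2, 45),
        last_replicated=datetime(2024, 10, 1, 1, 50),
    )
    report = evaluate_drill(drill, rto=timedelta(hours=1), rpo=timedelta(minutes=5))
    print(report["meets_rto"], report["meets_rpo"])  # True False
```

Here the drill meets a one-hour RTO with 45 minutes of downtime but misses a five-minute RPO, because the last ten minutes of data never reached the recovery site.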


Execute - On the day of the event, or throughout a multi-day event, it is common practice among retailers to hold war-room-style meetings that bring all teams into the same room to monitor, analyze issues in person across teams and act on them quickly. Multiple television screens showing health dashboards from all systems are key to catching negative trends or issues early. The business may demand changes to products or pricing based on sales or competitors, and IT executes those requests. If all goes well, the event comes down to counting orders and sales and running IT jobs on the adjusted schedule.
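Dashboards catch negative trends more reliably when the trend is also checked automatically. One simple approach, sketched below, compares a moving average of a metric such as checkout error rate against its normal-day baseline and raises a flag when it exceeds a chosen multiple. The window size, factor, baseline and class name are assumptions for illustration; production monitoring tools offer richer anomaly detection.

```python
from collections import deque


class TrendAlarm:
    """Flag a negative trend when the average of the most recent `window`
    samples exceeds the baseline by more than `factor`."""

    def __init__(self, baseline, window=5, factor=2.0):
        self.baseline = baseline
        self.factor = factor
        self.recent = deque(maxlen=window)  # oldest samples drop off automatically

    def observe(self, value):
        self.recent.append(value)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough samples yet
        average = sum(self.recent) / len(self.recent)
        return average > self.baseline * self.factor
```

Averaging over a window keeps a single noisy sample from triggering the alarm, while a sustained rise still shows up within a few samples.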


V. Post Execution 

A sales event can end in one of three scenarios: i) the site did not get the expected number of users; ii) the site got the expected visitors but the IT systems could not handle them; or iii) the site got the expected users and sales with no downtime or inventory issues. The third scenario is the most desired and fruitful for all. Along with sales, most retailers also see an increase in credit fraud and returns. Irrespective of the outcome, the most valuable takeaway for business and IT is the lessons learned. Lessons learned from a sales event are a key input for preparing for and assessing the next one. Both what went well and what did not should be captured as soon as possible, before the exhausted teams sign off from the event.


VI. Conclusion 

No matter how long or how thorough the preparation is, websites will sometimes fail. Technologies such as microservices and cloud auto-scaling have taken much of the guesswork out of infrastructure sizing, but there will be surprises in one way or another. Big retailers such as Best Buy, Macy's, Target and JCPenney have all had downtime at one point or another. How quickly and gracefully they recover, learn the lessons and implement solutions based on those lessons is what increasingly differentiates them.